Statistical potential-based amino acid similarity matrices for aligning distantly related protein sequences.
Abstract
Aligning distantly related protein sequences is a long-standing problem in bioinformatics and a key to successful protein structure prediction. Its importance has grown with structural genomics projects, because more and more experimentally solved structures are available as templates for protein structure modeling. To this end, recent structure prediction methods employ profile-profile alignments, and various ways of aligning two profiles have been developed. More fundamentally, a better amino acid similarity matrix can improve the profile itself, resulting in more accurate profile-profile alignments. Here we have developed novel amino acid similarity matrices from knowledge-based amino acid contact potentials. Contact potentials are used because the propensity of a residue to contact the other amino acids should be one of the most conserved features of each position in a protein structure. The derived amino acid similarity matrices are tested on benchmark alignments at three levels: the family, the superfamily, and the fold level. Compared to BLOSUM45 and the other existing matrices, the contact potential-based matrices perform comparably in family-level alignments but clearly outperform them in fold-level alignments. The contact potential-based matrices perform even better when suboptimal alignments are considered. Comparing the matrices with one another revealed that the contact potential-based matrices are very different from BLOSUM45 and the other matrices, indicating that they are located in a different basin of the amino acid similarity matrix space.
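The abstract does not state how a similarity matrix is obtained from a contact potential, so the following is only a minimal sketch under a loud assumption: two amino acids are scored as similar when their rows of contact energies are strongly correlated (Pearson correlation). The four-residue alphabet, the values in `TOY_POTENTIAL`, and the function names `pearson` and `similarity_from_contacts` are all hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def similarity_from_contacts(residues, potential):
    """Score each residue pair by the correlation of their contact-energy rows.

    ASSUMPTION: correlation of rows is an illustrative choice, not the
    derivation used in the paper.
    """
    sim = {}
    for i, a in enumerate(residues):
        for j, b in enumerate(residues):
            sim[(a, b)] = pearson(potential[i], potential[j])
    return sim


# Hypothetical toy data: a symmetric 4x4 contact potential (lower = more favorable).
RESIDUES = ["L", "V", "K", "D"]
TOY_POTENTIAL = [
    [-1.0, -0.8, 0.3, 0.4],
    [-0.8, -0.7, 0.2, 0.3],
    [0.3, 0.2, 0.5, -0.6],
    [0.4, 0.3, -0.6, 0.6],
]

sim = similarity_from_contacts(RESIDUES, TOY_POTENTIAL)

for a in RESIDUES:
    print(a, " ".join(f"{sim[(a, b)]:6.2f}" for b in RESIDUES))
```

In this toy setting the two hydrophobic residues (L, V) have nearly parallel contact profiles and therefore score as more similar to each other than either does to the charged residues. A real application would start from a full 20x20 knowledge-based potential and would likely rescale the scores before using them in an alignment program.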
Similar resources
A 3D-1D Substitution Matrix for Protein
In protein fold recognition, a probe amino acid sequence is compared to a library of representative folds of known structure to identify a structural homolog. In cases where the probe and its homolog have clear sequence similarity, traditional residue substitution matrices have been used to predict the structural similarity. In cases where the probe is sequentially distant from its homolog, we ...
Estimation of Evolutionary Distance between Distantly Related Sequences of Amino Acids, Taking Account of Patterns of Amino Acid Replacement
When amino acid sequences are distantly related (for instance, when their identity is <0.30), it is difficult to estimate their evolutionary distance. A method called the “similarity distance method” (SD method) was developed to obtain maximum-likelihood estimates of evolutionary distance between amino acid sequences, on the basis of a given pattern of amino acid replacement. Computer simulation r...
A 3D-1D substitution matrix for protein fold recognition that includes predicted secondary structure of the sequence.
In protein fold recognition, a probe amino acid sequence is compared to a library of representative folds of known structure to identify a structural homolog. In cases where the probe and its homolog have clear sequence similarity, traditional residue substitution matrices have been used to predict the structural similarity. In cases where the probe is sequentially distant from its homolog, we ...
Development of a new glycan score matrix
Glycans are chains of monosaccharides also known as oligosaccharides. Since glycans consist of monosaccharides having multiple hydroxyl groups which bind with potentially multiple other monosaccharides, glycans have very complicated structures compared to nucleic acid or protein sequences. The complexity is complicated further by various glycosidic linkage patterns which vary according to anome...
The random character of protein evolution and its effects on the reliability of phylogenetic information deduced from amino acid sequences and compositions.
Because evolution occurs by random events, the actual number of substitutions that occur in any period is not exactly equal to the number expected from the mean rate of substitution, but is statistically distributed about it. In consequence, even if rates of evolution are constant in different lineages, 'trees' deduced from descendant protein sequences contain random errors. When there are fewe...
Journal: Proteins
Volume 64, Issue 3
Pages: -
Publication date: 2006